Abstract

The most bacteria-like mitochondrial genome known is that of the jakobid flagellate Reclinomonas americana NZ. This genome also
encodes the largest known gene set among mitochondrial DNAs (mtDNAs), including the RNA subunit of RNase P (transfer RNA
processing), a reduced form of transfer-messenger RNA (translational control), and a four-subunit bacteria-like RNA polymerase,
which in other eukaryotes is substituted by a nucleus-encoded, single-subunit, phage-like enzyme. Further, protein-coding genes are
preceded by potential Shine-Dalgarno translation initiation motifs. Whether similarly ancestral mitochondrial characters also exist in
relatives of R. americana NZ is unknown. Here, we report a comparative analysis of nine mtDNAs from five distant jakobid genera:
Andalucia, Histiona, Jakoba, Reclinomonas, and Seculamonas. We find that Andalucia godoyi has an even larger mtDNA gene
complement than R. americana NZ. The extra genes are rpl35 (a large subunit mitoribosomal protein) and cox15 (involved in
cytochrome oxidase assembly), which are nucleus encoded throughout other eukaryotes. Andalucia cox15 is strikingly similar to
its homolog in the free-living α-proteobacterium Tistrella mobilis. Similarly, a long, highly conserved gene cluster in jakobid mtDNAs,
which is a clear vestige of prokaryotic operons, displays a gene order more closely resembling that in free-living α-proteobacteria than
in Rickettsiales species. Although jakobid mtDNAs, overall, are characterized by bacteria-like features, they also display a few
remarkably divergent characters, such as 3'-tRNA editing in Seculamonas ecuadoriensis and genome linearization in Jakoba libera.
Phylogenetic analysis with mtDNA-encoded proteins strongly supports monophyly of jakobids with Andalucia as the deepest
divergence. However, it remains unclear which α-proteobacterial group is the closest mitochondrial relative.

Key words: complete mtDNA sequences, genome evolution, gene migration to nucleus, excavates. 



Introduction 

Mitochondria are organelles of α-proteobacterial origin that
contribute to ATP and metabolite production in the eukaryotic
cell and typically contain their own mitochondrial DNA
(mtDNA). The evolutionary transformation of the endosymbiont
to an organelle was accompanied by drastic genome
reduction. Of the initial approximately 1,000-8,000 genes
(estimated from the gene content of contemporary bacteria;
National Center for Biotechnology Information (NCBI)
Genome Database), only 0.5-1.2% are retained in mtDNAs.
"Domestication" of the endosymbiont rendered many of its
biological functions (e.g., biotin synthesis) unnecessary, leading
to the elimination of redundant genes. A further drastic
reduction of coding capacity resulted from massive gene
migration from mtDNA to the nucleus. Nuclear genes
acquired from the endosymbiont generally encode components
of the organelle itself, such as transporters, building
blocks of the inner membrane, metabolic enzymes, and proteins
of the oxidative phosphorylation and protein synthesis
machinery. However, the exact number of nuclear genes
of α-proteobacterial origin is still a matter of debate (reviewed
in Gray et al. 2001).

The most gene-rich mtDNA reported to date is that of
Reclinomonas americana NZ (Lang et al. 1997), which is a
member of the jakobids. (Note that the term "jakobids" has
been used previously to circumscribe "core-jakobids" plus
malawimonads [e.g., Lang et al. 1999; O'Kelly and Nerad
1999].) However, the inclusion of malawimonads in this
assemblage is not supported from a phylogenetic point of
view (e.g., Rodríguez-Ezpeleta et al. 2007; Hampl et al.
2009; Derelle and Lang 2011). Jakobids are bacterivorous
unicellular eukaryotes characterized by two flagella, one of which
is directed posteriorly, and a feeding groove along the
body used for capture and ingestion of small particles and
bacteria (Flavin and Nerad 1993; O'Kelly 1993). The mitochondrial
genome of R. americana (designated below as
Reclinomonas-94) specifies as many as 96 assigned genes,
including 65 proteins and 31 structural RNAs, plus two open
reading frames (ORFs) longer than 100 residues (Lang et al.
1997). Reclinomonas-94 genes otherwise rarely found in
mtDNA encode NADH dehydrogenase subunits 7-11
(nad7-nad11), succinate dehydrogenase subunits, mitoribosomal
proteins, ABC and twin arginine transporters, and RNase
P-RNA. Genes identified in at most one other mtDNA code
for the Tu elongation factor A, ATP synthase subunit 3, and
cox11 involved in cytochrome oxidase assembly. Exclusively
present in Reclinomonas-94 mtDNA are genes for six additional
large subunit (LSU) mitoribosomal proteins, four subunits
of bacteria-type RNA-polymerase, secY specifying a
protein transporter, and ssrA, which encodes (a reduced
form of) transfer-messenger RNA (tmRNA) (Jacob et al.
2004); four of the above listed genes (atp4, ssrA, tatA, and
tatC) were not reported in the initial publication (Lang et al.
1997) but rather detected later. In addition to the unusually
large gene complement and genes encoding a bacteria-like
RNA-polymerase, this mtDNA exhibits remarkably primitive
features such as putative Shine-Dalgarno (SD) motifs and
gene clusters closely resembling prokaryotic operons. These
combined features prompted the apt description of the
mitochondrion of Reclinomonas-94 as "the mitochondrion that
time forgot" (Palmer 1997).

The above findings raised a number of intriguing questions.
Is Reclinomonas-94 a rare exception or do other jakobids have
similarly ancestral mtDNAs? Can we recognize evolutionary
trends by comparing various jakobid mitochondrial genomes?
To address these questions, we sequenced eight mtDNAs
from jakobids belonging to the genera Andalucia (Lara et al.
2006), Histiona (Flavin and Nerad 1993), Jakoba (Patterson
1990), Reclinomonas (Flavin and Nerad 1993), and
Seculamonas (Edgcomb et al. 2001; O'Kelly CJ, unpublished
data). We find that mtDNAs from all jakobids are considerably
more eubacteria-like than mtDNA from any other eukaryote.
Moreover, one of the jakobids that branches basally to all
other jakobids in the mitochondrial protein-based
phylogenetic tree has retained even more mtDNA-encoded
genes than Reclinomonas-94.

Materials and Methods 

Strains and Culture 

The jakobid strains used in this study (listed in table 1) were
obtained from the American Type Culture Collection (ATCC),
except for Andalucia godoyi, which was kindly provided by
A. Simpson (Lara et al. 2006). The large variety of (food)
bacteria that are present in the original strain isolates was
reduced by repeated dilution in growth medium so as to
retain only a few jakobid cells, and by adding to these isolates
precultured live Enterobacter aerogenes (ATCC 13048)
bacteria as food. We used WCL culture medium for R. americana
species (ATCC 50394, 50283, 50284, 50633), Histiona
aroides (ATCC 50634), Seculamonas ecuadoriensis (ATCC
50688), and A. godoyi, and F/2 medium for the two jakobids
that were isolated from marine environments, Jakoba libera
(ATCC 50422) and J. bahamiensis (ATCC 50695; currently not
listed at ATCC's website). Detailed recipes for the media are
described at http://megasun.bch.umontreal.ca/People/lang/FMGP/methods.html.
Cultures (500 ml) in 2.5-l Erlenmeyer
flasks were gently shaken at 22 °C and daily supplemented
with live bacteria. Cells were harvested by centrifugation in
the early stationary growth phase (after 2-5 days), when most
food bacteria were consumed.

Purification and Sequencing of mtDNA 

Jakobid cells were broken mechanically to extract total DNA,
and mtDNA was isolated by CsCl-bisbenzimide equilibrium
gradient centrifugation, based on a higher A + T content
than nuclear DNA (Lang and Burger 2007). Random libraries
were constructed from mtDNA that was fragmented by
nebulization and then cloned and sequenced (Sanger; Licor
sequencer) by a whole-genome shotgun approach (Lang
and Burger 2007). Because the mtDNA preparation of
H. aroides was considerably contaminated with nuclear DNA
(which is nearly as A + T-rich as mtDNA), random sequencing
was combined with sequencing of DNA amplified by long
polymerase chain reaction.

Genome Annotation 

Gene annotation was performed with the automated tool 
MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/ 
mfannotlnterface.pl) developed in house. In brief, MFannot 
predicts group I and group II introns, tRNAs, RNase P-RNA, 
and 5S ribosomal RNA (rRNA) with Erpin as a search engine 
(Gautheret and Lambert 2001), based on RNA structural pro- 
files established by us. Exons of protein-coding genes are in- 
ferred in a first round with Exonerate (Slater and Birney 2005) 
and then for less well-conserved genes with HMMER (Eddy S; 
http://hmmer.janelia.org), based on models for all known 
mtDNA-encoded proteins. Only sequence positions that are 
aligned with confidence are retained for model construction. 
Mini-exons (as short as 3 nt) that are not resolved by Exonerate 
but inferred by the presence of orphan introns are detected as 
missing protein regions in multiple protein alignments. The 
precise placement of small exons is based on the best fit of 
Hidden Markov Model (HMM) protein profiles and on the fit 
with conserved nucleotide sequence profiles of group I or
group II exon-intron boundaries. Genes encoding the small
subunit (SSU) and LSU rRNAs are predicted with HMM profiles 
covering the most highly conserved domains, allowing precise 
placement of the SSU rRNA termini but only approximate po- 
sitioning of LSU rRNA ends. The latter termini, as well as the 
precise exon-intron boundaries of rRNA genes, are predicted 
manually using comparative structure modeling. In any case, 
automated annotations are complemented by manual analy- 
ses to account for MFannot warnings (e.g., potential trans- 
spliced genes, gene fusions, frame shifts, alternative transla- 
tion initiation sites, and failure to identify mini-introns), and 
find features that are not (yet) recognized by automated pro- 
cedures (e.g., tmRNA genes). Unidentified ORFs located adja- 
cent to genes that in bacteria are arranged in operons were 
examined individually using Position-Specific Iterated-Basic 
Local Alignment Search Tool (PSI-BLAST; Altschul et al. 
1997). In addition, we built HMM profiles of the positional 
counterpart in bacteria and searched all ORFs against this pro- 
file. ORFs with significant or close-to-significant sequence sim- 
ilarity were further validated or rejected by inspecting multiple 
protein alignments. 

Mitochondrial tmRNAs were searched for with a covariance
model that was built from an alignment of previously
identified jakobid tmRNAs (Jacob et al. 2004), using the cmbuild
and cmcalibrate tools that are included in the most recent
implementation of Infernal (version 1.1rc1; Eddy S;
http://hmmer.janelia.org). Only confidently aligned nucleotides
were used for model building and searching, by applying
the --hand option. The model identifies the circularly
permuted gene sequences with high confidence (E value
< 1.9e-12) and even recognizes the 3'-part of the continuous
J. libera gene (E value: 5.8e-3).

In Silico Search for Sequence Motifs and 
Genome Signatures 

Basic sequence manipulations (formatting, reverse comple- 
mentation, and translation) were conducted with the in-house 
tool FLIP. Genic and intergenic regions of jakobid mtDNA were
extracted using PEPPER, also developed in-house. These soft- 
ware tools are described at http://megasun.bch.umontreal.ca/ 
ogmp/ogmpid.html and are available on request. Sequence 
identities of intergenic regions were evaluated using FASTA 
(Pearson 2000). 

To locate potential SD motifs, that is, sequence motifs
complementary to the inferred 3'-end of mitochondrial SSU
rRNA (5'-CUCCUUU-OH), we used RNAhybrid, available on the
bibiserver at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid
(Rehmsmeier et al. 2004). Hairpin elements in intergenic
regions were identified with RNAalifold (Bernhart et al. 2008).
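
The idea behind the SD motif search can be illustrated with a minimal sketch. It is not a substitute for RNAhybrid, which ranks candidates by hybridization free energy; here we simply derive the DNA-level motif complementary to the SSU rRNA 3'-end and look for exact matches 7-15 nt upstream of a start codon. The function names, the exact-match criterion, and the convention that the distance is measured from the first nucleotide of the motif to the first nucleotide of the start codon are assumptions made for illustration only.

```python
# Minimal sketch (assumption: exact matching instead of free-energy scoring).

def sd_motif(anti_sd_rna):
    """Reverse complement of the rRNA 3'-end, written as coding-strand DNA."""
    complement = {"A": "T", "C": "G", "G": "C", "U": "A"}
    return "".join(complement[base] for base in reversed(anti_sd_rna))

def find_sd_like(seq, start, motif, window=(7, 15)):
    """Return distances (nt upstream of `start`) at which `motif` begins.

    `start` is the 0-based index of the first nucleotide of the start codon.
    """
    hits = []
    for dist in range(window[0], window[1] + 1):
        pos = start - dist
        if pos >= 0 and seq[pos:pos + len(motif)] == motif:
            hits.append(dist)
    return hits

motif = sd_motif("CUCCUUU")                 # complementary to 5'-CUCCUUU-OH
toy_upstream = "TTTTAAAGGAGTTTTTTATGAAA"    # hypothetical sequence, ATG at index 17
hits = find_sd_like(toy_upstream, 17, motif)
```

In practice one would relax the exact-match requirement (e.g., accept shorter cores or mismatches), which is what an energy-based tool effectively does.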

For the prediction of the probable origins and termini of
replication in jakobid mtDNAs, we applied the cumulative GC
skew technique, which measures the asymmetric strand
distribution of G and C for individual fixed-length windows along
the sequence, that is (G-C)/(G + C), and then sums the
score values. The cumulative skew plotted along the sequence
indicates the probable origin (local minimum) and terminus (local
maximum) of replication. For this analysis, we used the
gc_skew implementation at the webserver
http://gcat.davidson.edu/DGPB/gc_skew/gc_skew.html (Grigoriev 1999).
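
The cumulative GC skew calculation described above can be sketched as follows. This is an illustrative reimplementation, not the code of the gc_skew webserver; the window size, the use of non-overlapping windows, and the function names are assumptions.

```python
# Minimal sketch of cumulative GC skew over non-overlapping windows.

def cumulative_gc_skew(seq, window=100):
    """Sum (G-C)/(G+C) over consecutive fixed-length windows."""
    cumulative, total = [], 0.0
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if (g + c) else 0.0
        cumulative.append(total)
    return cumulative

def origin_and_terminus(cumulative, window):
    """Window start positions of the minimum (origin) and maximum (terminus)."""
    idx = range(len(cumulative))
    ori = min(idx, key=cumulative.__getitem__)
    ter = max(idx, key=cumulative.__getitem__)
    return ori * window, ter * window

toy = "C" * 20 + "G" * 20          # hypothetical sequence with a strand switch
skew = cumulative_gc_skew(toy, window=10)
ori, ter = origin_and_terminus(skew, window=10)
```

For a circular-mapping genome, the choice of the sequence start point shifts the curve but not the relative positions of its extrema.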

Destabilized helical DNA regions, which are often
associated with replication origins, promoters, and
protein-binding sites, were predicted with the SIDD tool (Bi
and Benham 2004; Zhabinskaya and Benham 2011)
(webserver at http://benham.genomecenter.ucdavis.edu/sibz/).
Overlapping 10-kbp genome portions were analyzed with
the parameters DNA_Type = linear, temperature = 310 K,
and Superhelical_Density = -0.055. Superhelically induced
duplex destabilization, which facilitates or creates local sites
of strand separation, is inferred from particular dinucleotide
repeats and the equilibrium probability of transition from
right-handed B to left-handed Z form of DNA.

Potential bacteria-like promoters were predicted with
BPROM (http://linux1.softberry.com), which was chosen
among various public web-accessible predictors because it
yields for jakobid mtDNAs a reasonable number (in the
order of 200) of potential sites per genome, instead of 0 or
> 1,000, obtained with other tools tested. BPROM searches
for sequence motifs derived from validated bacterial
functional sites collected in the DPInteract database (Robison
et al. 1998). The algorithm of promoter identification is
based on a linear discriminant function that accounts for
sequence characteristics of promoter regions. Because the
score is a logarithmic value with the neutral score being 0,
any site scoring >0 is more likely than not to be a promoter. For
bacterial sequences, the threshold is usually set to 1 or more,
but because jakobid mtDNA sequences are A + T-rich and
probably produce more false positives as a result, we
calculated a specific threshold for each genome. To do so, mtDNA
sequences were randomly shuffled with SHUFFLE DNA
(webserver at http://bioinformatics.org/sms/), and these
sequences were analyzed again with BPROM. The highest
score in the random-shuffled genome sequences was 9.24
for A. godoyi and 15.51 for Reclinomonas-94 mtDNA. Hits
with scores below these values are therefore considered
false positives.
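
The genome-specific thresholding can be sketched as below. BPROM and SHUFFLE DNA are web tools, so the scoring step is not reproduced; the sketch only shows the shuffling of a sequence (which preserves its base composition) and the filtering of hits against the highest score obtained on the shuffled sequence. The function names and the example hit list are hypothetical; the threshold 9.24 is the value reported above for A. godoyi.

```python
# Minimal sketch: shuffle-based null threshold for promoter scores.
import random

def shuffle_sequence(seq, rng):
    """Randomly permute a sequence, preserving mononucleotide composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def null_threshold(shuffled_scores):
    """Highest score observed on the shuffled genome."""
    return max(shuffled_scores)

def keep_hits(hits, threshold):
    """Discard hits scoring below the threshold (treated as false positives)."""
    return [(pos, score) for pos, score in hits if score >= threshold]

rng = random.Random(1)
shuffled = shuffle_sequence("AATTGGCCAT", rng)           # toy sequence
# Hypothetical (position, score) pairs from a promoter predictor:
hits = [(120, 12.3), (4500, 5.1), (9000, 9.24)]
retained = keep_hits(hits, threshold=9.24)
```

A single shuffle gives only one draw from the null distribution; repeating the shuffle several times and taking the overall maximum would give a more conservative threshold.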

Recent acquisition of foreign genome portions via horizontal
transfer in J. libera mtDNA was tested with IGIPT (Jain et al.
2011) (webserver at http://bioinf.iiit.ac.in/IGIPT), by comparing
the codon bias and amino acid bias of the ORFs at the termini
of the linear chromosome (dpo, orf98, orf339, orf436, and
orf686) with that of typical mitochondrial genes, as well as by
scanning for changes in dinucleotide composition along the
mtDNA. For calculation of codon bias, the average frequencies
for codons specifying a particular amino acid are normalized,
following which differences in codon usage of one gene set
relative to another are determined. The amino acid bias
is based on residue frequencies of a particular gene set
compared with the average frequencies for the genome.
The dinucleotide bias calculation assesses differences between
the observed dinucleotide frequencies and those expected
from random associations of mononucleotide frequencies.
We set the search parameters as follows: standard
deviation = 1.5 for filtering results to be reported; window
size = 10,000 kb.
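
As an illustration of the dinucleotide bias measure, the following sketch computes, for each dinucleotide observed in a sequence, the ratio of its observed frequency to the frequency expected from the product of the two mononucleotide frequencies. This odds-ratio form is one common formulation and is an assumption here; IGIPT may compute the difference in another way, and the function name is hypothetical.

```python
# Minimal sketch: observed/expected dinucleotide frequencies.
from collections import Counter

def dinucleotide_odds(seq):
    """Map each observed dinucleotide xy to f(xy) / (f(x) * f(y))."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        pair: (count / (n - 1)) / ((mono[pair[0]] / n) * (mono[pair[1]] / n))
        for pair, count in di.items()
    }

odds = dinucleotide_odds("ATATATAT")   # toy sequence: only AT and TA occur
```

Values near 1 indicate that a dinucleotide occurs about as often as expected by chance; comparing such profiles between sliding windows and the genome average is how compositionally atypical regions would be flagged.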

Phylogenetic Analysis 

The data set contains 19 mitochondrion-encoded proteins
(Cox1, 2, 3, Cob, Atp1, 6, 9, and Nad1, 2, 3, 4, 4L, 5, 6, 7,
9, 10, 11, TufA) from representative (preferably slowly
evolving) taxa for which complete mtDNA sequences are available
and from the jakobids studied here (the very closely related
Reclinomonas species are represented by Reclinomonas-94).
Protein collections were managed and automatically aligned,
trimmed, and concatenated with Mams (developed in house;
Lang BF and Rioux P, unpublished). Mams uses MUSCLE
(Edgar 2004) for an initial alignment, followed by a refinement
step with HMMalign (Eddy S; http://hmmer.janelia.org) and
the elimination of all sequence positions with posterior
probabilities lower than 1. The final data set contained 52 taxa and
5,791 amino acid positions (alignment is available from the
authors upon request).

For phylogenetic analyses by Bayesian inference
(PhyloBayes [Lartillot and Philippe 2004]), we used the default
CAT model, six discrete categories, four independent chains,
14,000 cycles (corresponding to ~770,000 generations), and
the -dc parameter to remove constant sites. The first 10,000
cycles were discarded as burn-in. The robustness of internal
branches was evaluated based on jackknife replicates (100 at
65%) rather than by bootstrap analysis, because the latter
method generates duplicated sequence sites whose modeling
is problematic with the Bayesian approach. Maximum
likelihood analysis was performed with RAxML-HPC (Stamatakis
2006) V7.2.2, using the LG model (PROTGAMMALGF), and
the fast bootstrapping option (100 replicates).
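
The difference between jackknife and bootstrap resampling mentioned above can be made concrete with a short sketch. Here "100 at 65%" is read as 100 replicates, each retaining 65% of the alignment positions drawn without replacement, so that no site is duplicated; this reading, the function names, and the toy alignment are assumptions.

```python
# Minimal sketch: jackknife resampling of alignment columns (no duplicates).
import random

def jackknife_columns(n_cols, fraction=0.65, rng=None):
    """Pick round(fraction * n_cols) distinct column indices, sorted."""
    rng = rng or random.Random()
    k = round(n_cols * fraction)
    return sorted(rng.sample(range(n_cols), k))

def subsample_alignment(alignment, cols):
    """Keep only the selected columns of every aligned sequence."""
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

rng = random.Random(0)
replicates = [jackknife_columns(5791, 0.65, rng) for _ in range(100)]

toy_alignment = {"taxonA": "MKVLA", "taxonB": "MRVLG"}   # hypothetical
reduced = subsample_alignment(toy_alignment, [0, 2, 4])
```

Because sampling is without replacement, each replicate is a strict subset of the original sites, which avoids the duplicated columns that bootstrap resampling would produce.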

Sequence Deposition in Public Repositories 

The annotated mtDNA sequences of jakobids (accession num- 
bers KC353352-59) have been deposited in GenBank. 

Results 

General Features of Mitochondrial Genomes 

Among characterized mitochondrial genomes, those in jako- 
bids are of intermediate size, ranging from 65 to 100 kbp 
(table 1). Jakobid mtDNAs are circular mapping except for 
J. libera whose mtDNA is a linear monomer. The A + T content 
of these mitochondrial genomes is moderate (64-74%), and 
the proportion of coding regions (including introns) versus 
intergenic sequence is high (80-93%). To our knowledge, 
only mtDNA from the red alga Chondrus crispus is more 
compact, with coding sequences in that case amounting to
approximately 96%. 

Gene Content of Mitochondrial Genomes 

Generally, mitochondrial genomes encode a small set of pro- 
teins and RNAs that are involved in oxidative phosphorylation 
("OXPHOS") and protein synthesis. In a few eukaryotes, the 
products of mtDNA-encoded genes also take part in protein 
import and maturation, respiratory complex assembly, and 
tRNA processing. Only in jakobids are mtDNA-specified 
genes implicated in transcription (RNA polymerase) and trans- 
lational quality control (tmRNAs). Table 2 provides an overview 
of jakobid mitochondrial genes and the biological processes in 
which they participate. The mitochondrial gene content of 
individual jakobids is compiled in tables 3 and 4, compared 
with that of two other gene-rich protists. 

Tables 3 and 4 show that although Andalucia mtDNA does
not have a secY gene, it is clearly the eukaryote with the
largest number of identified mitochondrial genes (exactly
100), surpassing Reclinomonas in that regard. Only
Andalucia mtDNA carries rpl35 (specifying an LSU ribosomal
protein) and cox15 (whose protein product is involved in
cytochrome oxidase assembly), genes that have relocated to
the nucleus in all other eukaryotes. Andalucia cox15
particularly resembles its homolog in Tistrella mobilis, a free-living
α-proteobacterium, sharing 38% sequence identity over the
entire protein and three specific indel signatures; sequence
identities with other bacterial and eukaryotic homologs are
below 28% (a multiple protein sequence alignment is
shown in supplementary fig. S1, Supplementary Material
online). Another remarkable feature is the presence of a
mtDNA-encoded trnT in Andalucia, which is lacking in all
other jakobids, where the corresponding tRNA is most likely
imported from the cytosol, because it is essential for
translation of mtDNA-encoded genes.

The jakobid with the fewest mitochondrial genes is J. libera, 
with gene losses among protein-coding and tRNA genes 



Table 2

Genes in Jakobid mtDNA and Their Functions

Biological Process                                   Genes a

Electron transport and ATP synthesis
  NADH dehydrogenase (complex I) subunits            nad1,2,3,4,4L,5,6,7,8,9,10,11
  Succinate dehydrogenase (complex II) subunits      sdh2,3,4
  Cytochrome bc1 complex (complex III) subunits      cob
  Cytochrome c oxidase (complex IV) subunits         cox1,2,3
  ATP synthase (complex V) subunits                  atp1,3,4,6,8,9
Translation
  SSU ribosomal proteins                             rps1,2,3,4,7,8,10,11,12,13,14,19
  LSU ribosomal proteins                             rpl1,2,5,6,10,11,14,16,18,19,20,27,31,32,34,35
  Elongation factor                                  tufA
  Ribosomal RNAs (LSU rRNA, SSU rRNA, 5S rRNA)       rnl, rns, rrn5
  Transfer RNAs                                      trnA...Y
  tmRNA (reversal of stalled translation)            ssrA
Transcription
  Core RNA polymerase                                rpoA,B,C
  Sigma-like factor                                  rpoD
Protein import
  ABC transporters                                   ccmA,B
  Heme delivery                                      ccmC
  SecY-type transporter                              secY
  SecY-independent transporters                      tatA,C
Protein maturation
  Cytochrome c oxidase assembly                      cox11,15
  Heme c maturation                                  ccmF
RNA processing
  RNase P (5'-tRNA processing)                       rnpB
Unknown
  DNA polymerase (plasmid derived)                   dpo


a Black, common mitochondrial genes; blue, expanded gene set, present in mtDNAs of various protists and plants; red, predominantly found in
jakobids (rpl32 is otherwise only known from Vitis vinifera and Populus alba mtDNA [GenBank accession no. YP_002608375; BAG80685]; tufA in
Hartmannella vermiformis mtDNA [Burger et al., GenBank accession no. GU828005]; rpl19 in Hartmannella, Malawimonas californiana, and M. jakobiformis
[Gray et al. 2004]; ssrA in oomycete mtDNA [Lang et al., in preparation]; and cox11 in Naegleria [Burger et al., GenBank accession no. AF288092]);
red underline, exclusively found in jakobid mtDNAs. For a previous description of "non-standard" mitochondrial genes, see Gray et al. (2004).
Table 3

Comparison of Protein-Coding Gene Sets in Jakobid mtDNAs a

Gene | Jakobids: Andalucia, Histiona, Jakoba bahamiensis, J. libera, Recli-33, Recli-83, Recli-84, Recli-94, Seculamonas | Cryptophyte: Hemiselmis andersenii | Streptophyte: Chlorokybus atmosphyticus


atpl m 




■ ■ ■ ■ ■ ■ 


■ 




■ 


atp3 u 


u a 


■ ■ ■ ■ ■ ■ 


□ 




□ 


atp4 m 


a a 


■ ■ ■ ■ ■ ■ 


■ 




■ 


atp6 m 


H a 


a ■ ■ ■ ■ ■ 


■ 




■ 


atp8 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




■ 


atp9 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




■ 


ccmA u 


a ■ 


a ■ ■ ■ ■ ■ 


□ 




□ 


ccmB m 


a ■ 


■ ■ ■ ■ ■ ■ 


□ 




□ 


ccmC u 


a ■ 


a ■ ■ ■ ■ ■ 


□ 




□ 


ccmF u 


a ■ 


a ■ ■ ■ ■ ■ 


□ 




□ 


cob m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 






coxl m 


a ■ 


a ■ ■ ■ ■ ■ 


■ 




a 


cox2 m 


a ■ 


a ■ ■ ■ ■ ■ 


■ 




a 


cox3 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


coxl 1 m 


a ■ 


■ ■ ■ ■ ■ ■ 


□ 






cox15 m 






□ 






nad1 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad2 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad3 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad4 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad4L u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad5 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad6 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad7 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad8 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 






nad9 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nadW u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


nad11 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 






rpH u 


a ■ 


■ ■ ■ ■ ■ □ 


□ 






rpl2 m 


a ■ 


■ ■ ■ ■ ■ ■ 


□ 




a 


rpl5 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


rpl6 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


rpHO m 


a ■ 


■ ■ ■ ■ ■ ■ 


□ 




■ b 


rpl11 m 


a ■ 


■ ■ ■ ■ ■ ■ 


□ 






rpl14 u 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


rpl16 m 


a ■ 


■ ■ ■ ■ ■ ■ 


■ 




a 


rpl18 u 


a ■ 


□ ■ ■ ■ ■ ■ 


□ 




□ 


rpl19 u 




□ ■ ■ ■ ■ ■ 
Table 3 Continued

Gene | Jakobids: Andalucia, Histiona, Jakoba bahamiensis, J. libera, Recli-33, Recli-83, Recli-84, Recli-94, Seculamonas | Cryptophyte: Hemiselmis andersenii | Streptophyte: Chlorokybus atmosphyticus

rps4 ■■■■■■■■■ ■ ■ 

rps7 ■■■■■■■■■ ■ ■ 

rps8 ■■■■■■■■■ ■ ■ 

rpslO ■■■■■■■■■ □ ■ 

rps11 ■■■■■■■■■ ■ ■ 

rps12 ■■■■■■■■■ ■ ■ 

rps13 ■■■■■■■■■ ■ ■ 

rps14 ■■■■■■■■■ ■ ■ 

rps19 ■■■■■■■■■ ■ ■ 

sdh2 ■■■■■■■■■ □ □ 

sdh3 ■■■■■■■■■ ■ ■ 

sdh4 ■■■■■■■■■ ■ ■ 



secY ■■■■■■■■ □ □ 

tat/A ■■■■■■■■■ ■ □ 

tatC ■■■■■■■■■ ■ ■ 

ti/£4 ■■■■■■■■■ □ □ 

ORF 67 4 22 2222 4 3 10 



a Taxon abbreviations: Recli-33, Reclinomonas americana-33; Recli-83, R. americana-83; Recli-84, R. americana-84; Recli-94, R. americana-94. For complete taxon names, see
table 1.

b Gene not annotated in GenBank record. 

c Number of ORFs. For full listing, see table 5. ORF minimum length is set at 35 residues for all jakobids. For Hemiselmis (GenBank accession no. NC_010637) and 
Chlorokybus (GenBank accession no. NC_009630), minimum ORF length is 100 residues. 



(tables 3 and 4). Interestingly, this mtDNA is also the only one
among jakobids to possess a gene (dpo) clearly related to
family B DNA polymerases. This gene (or pseudogene, see
Discussion) was most likely acquired secondarily, together
with several ORFs and tandem repeat arrays located at both
ends of the linear mtDNA, via integration of a mobile plasmid
into the mitochondrial genome. Horizontal transfer is
corroborated by significantly different sequence signatures
(dinucleotide composition, codon usage, amino acid frequency, and
G + C bias at different codon positions) in the terminal regions
compared with those in the central portion of the molecule
(supplementary tables S1 and S2, Supplementary Material
online). Plasmid-mediated gene gain in mtDNA is frequently
observed in fungal lineages (Fricova et al. 2010 and references
therein) but has also been reported in other organismal groups
(e.g., Takano et al. 1997).

In addition to the above genes of known function, jakobid
mitochondrial genomes contain several hypothetical
protein-coding genes (ORFs; table 5). Some of these ORFs may
represent functional genes, considering that a SD-like sequence
motif is located 7-15 nt upstream of the reading frame (see
later). Two groups of ORFs are conserved across the four
Reclinomonas mtDNAs, notably an ORF in the range of
169-181 amino acids in length and another 717-746 residues
long; we refer to these two ORF families as recli-orf169-181
and recli-orf717-746. Members of these families not only
share significant sequence similarity (26.4-84.1% identity
over >97% of the protein's length) but are also located in
the same synteny block (fig. 1). Interestingly, Histiona mtDNA
contains ORFs (orf163 and orf753, referred to as hist-orf163
and hist-orf753) of similar size and located in the same
positional context as their counterparts in Reclinomonas taxa.
However, sequence similarity is borderline; only hist-orf163
(E value: 3.6e-5) but not hist-orf753 is detected when
searching with the Reclinomonas-specific HMM profiles against
Histiona ORFs. Similarly, the search of Reclinomonas
ORF-HMM profiles against assigned mitochondrial and
α-proteobacterial proteins does not return hits above the
reporting threshold.

Intergenic Regions, Gene Order, and Orientation 

Figure 1 depicts the gene maps of the nine jakobid mtDNAs
described here. In the four Reclinomonas strains, mitochondrial
gene order is identical. These genomes have significant
sequence similarity even in intergenic regions. The highest
values are observed between Reclinomonas-33 and
Reclinomonas-84, with an average sequence identity in
intergenic regions of 80%. The least resemblance among
Reclinomonas strains is between Reclinomonas-33 and
Reclinomonas-94, where the identity average is 61%. In
contrast, mtDNAs of the two Jakoba taxa differ considerably.
Gene order varies due to numerous transpositions and
inversions, and equivalent intergenic sequences have diverged to
an extent that has erased any detectable similarity.
Table 4

Comparison of Gene Sets Coding for Structural RNAs in Jakobid mtDNAs a

Gene | Jakobids: Andalucia, Histiona, Jakoba bahamiensis, J. libera, Recli-33, Recli-83, Recli-84, Recli-94, Seculamonas | Cryptophyte: Hemiselmis andersenii | Streptophyte: Chlorokybus atmosphyticus



(2) 



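trnS(gga) ■ □ □ □ □ □ □ □

trnS(uga) ■ ■ ■ ■ ■ ■

trnT(ugu) ■ □ □ □ □ □ □ □

trnV(gac) ■ □ □ □ □ □ □ □

trnV(uac) ■ ■ ■ ■ ■ ■ ■ ■

trnW(cca) ■ ■ ■ ■ ■ ■ ■ ■

trnY(gua) ■ ■ ■ ■ ■ ■ ■ ■

(Only the cells legible in the source are shown, in their original order; fewer than the eleven columns survive per row, so cells after the first cannot be assigned to taxa.)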



a Taxa as in table 3.

b Plus trnN(auu).

c Plus trnR(cgc) and trnR(gcg).

d Plus trnR(ucg).

e Instead trnT(ggu).



Despite considerable differences in gene order between
the non-Reclinomonas jakobids, eight synteny blocks of
more than four genes are found across jakobids (fig. 2 and
supplementary fig. S2, Supplementary Material online).
Foremost in these clusters are genes specifying ribosomal
proteins (rps and rpl), but genes encoding NADH dehydrogenase
subunits (nad), cytochrome c maturation proteins (ccm), and
succinate dehydrogenase subunits (sdh) are present as well.
Most synteny blocks are relicts of bacterial operons, and
because the jakobid gene clusters are densely packed (with
genes sometimes overlapping), it is likely that they represent
polycistronic transcription units, as in bacteria.

We compared the gene order of jakobid mtDNAs with
that of nine diverse α-proteobacterial genomes. Figure 2
aligns the longest common jakobid synteny block with the
corresponding, usually contiguous, α-proteobacterial operons
(L11, L10, Beta, Str, S10, Spc, and Alpha). Jakobids collectively
display not only specific deletions (e.g., rpl12) but also an
insertion (nad11-nad1-cox11-cox3 inserted in the Str cluster). In
α-proteobacteria, nad and cox genes are typically part of
Table 5

Hypothetical Protein-Coding Genes in Jakobid mtDNAs a

Andalucia: orf35, orf97, orf123*, orf146*, orf203*, orf240*
Histiona: orf35, orf36, orf48, orf50, orf163, orf220, orf753
Jakoba bahamiensis: orf41, orf49, orf138, orf231
J. libera b: orf35, orf38-1, orf38-2, orf40, orf41, orf44, orf46, orf50, orf51, orf53, orf54, orf93, orf98 c, orf116, orf164, orf221, orf320-1, orf320-2, orf328, orf339, orf436, orf686
Recli-33: orf181*, orf746*
Recli-83: orf178*, orf742*
Recli-84: orf179*, orf746*
Recli-94: orf169*, orf717*
Seculamonas: orf35, orf125, orf285, orf364

a Number after "orf" indicates the number of amino acids contained in that ORF. ORFs of identical length but different sequence residing in the same mtDNA are
distinguished by -1, -2, etc. ORFs occurring in the same synteny context in different mtDNAs are highlighted by shared color shading, and those with a SD-like sequence
motif 7-15 nt upstream of the ATG codon (free energy < -8.5 kcal for pairing with anti-SD sequence in SSU rRNA) are marked by an asterisk.

b SD-like motifs absent from genome.

c Two copies of identical sequence present at both ends of the linear chromosome.



separate, larger nad and cox operons. Interestingly, the
gene arrangement in jakobid mitochondria is overall more
similar to that in free-living α-proteobacteria than to that in
Rickettsia-like intracellular pathogens, which share a cluster
discontinuity (fig. 2).

Ribosomal RNAs 

The mitochondrial SSU and LSU rRNAs of the nine jakobids are 
strikingly bacteria-like, conforming closely to the standard 
models for Escherichia coli 16S and 23S rRNA, respectively. 
Sequences share a high degree of nucleotide identity with 
each other and with their E. coli counterparts, particularly 
within the "universal core" that defines the functionally 
most critical portions of the rRNAs. In all jakobid mitochondrial 
LSU rRNAs, a 5'-5.8S rRNA-like domain and a 3'-4.5S 
rRNA-like region are readily apparent at the level of both primary
and secondary structure. Mitochondrial SSU rRNAs of
jakobids contain a 3'-terminal pyrimidine-rich motif that is
complementary to purine-rich SD-like elements upstream of
the start of many protein-coding genes in the corresponding
mitochondrial genomes (see later).

Overall, sequence identity ranges from approximately 70%
to >95%, with the SSU rRNAs of the 33, 83, and 84 strains of
R. americana being colinear and virtually identical in sequence,
as are the LSU rRNAs. At the other extreme, for both the SSU
and LSU mitochondrial rRNAs, the homologous Jakoba rRNAs
are as divergent from one another in both primary and
secondary structure as they are from their homologs in the
other jakobids. As anticipated, length variation among the
jakobid sequences, and relative to the corresponding E. coli
sequence, is almost exclusively confined to highly divergent
variable regions of secondary structure, previously identified
from comparisons of SSU and LSU rRNA homologs. An alignment
of jakobid mitochondrial and E. coli SSU rRNAs is shown
in supplementary figure S3, Supplementary Material online.

Jakobid mtDNAs encode a recognizable 5S rRNA, which 
also conforms closely to the corresponding bacterial 5S rRNA 
in primary and secondary structure, as previously reported for 
R. americana-94 mitochondrial 5S rRNA (Lang et al. 1996). 
The four Reclinomonas mitochondrial 5S rRNA sequences 
are colinear and virtually identical in sequence, whereas 
those of the two Jakoba species exhibit substantial variation. 
The most divergent of jakobid mitochondrial 5S rRNAs is that 
of J. bahamiensis. 

Transfer RNAs 

Not all mitochondrial genomes contain a full complement of 
tRNA genes, with the missing tRNAs being imported into the 
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Fig. 1. — Gene maps of jakobid mtDNAs. Black boxes represent genes. Gene-name colors depict taxonomic gene distribution: black, common genes 
found in most animal, fungal, and protist mtDNAs; blue, expanded gene set, mostly present in protist and plant mtDNAs; red, genes predominantly found in 
jakobid mtDNAs (for details, see table 2); green, hypothetical protein-coding genes (ORFs). ORFs of identical length but different sequence are discriminated 

(continued) 



Genome Biol. Evol. 5(2):418-438. doi:10.1093/gbe/evt008 Advance Access publication January 18, 2013




[Figure 2: alignment of gene order across the L11, L10, Beta, Str, S10, Spc, and Alpha clusters. Rows show the nine α-proteobacteria listed in the caption, followed by the jakobids Reclinomonas americana, Andalucia godoyi, Histiona aroides, Jakoba bahamiensis, J. libera, and Seculamonas ecuadoriensis; "//" marks a cluster discontinuity.]

Fig. 2. Gene order comparison of jakobid mtDNA and α-proteobacteria. Conservation of ribosomal gene organization in the mitochondrial genome of
jakobids, compared with the generally contiguous L11, L10, Beta, streptomycin (Str), S10, spectinomycin (Spc), and Alpha operons of α-proteobacteria.
Species abbreviations and GenBank Accession nos. are: Azos. spec., Azospirillum sp., NC_013854; Magn. magn., Magnetospirillum magneticum,
NC_007626; Midi. mito., Candidatus Midichloria mitochondrii, NC_015722; Orie. tsut., Orientia tsutsugamushi str. Ikeda, NC_010793; Rhod. cent.,
Rhodospirillum centenum, NC_011420; Rhod. rubr., Rhodospirillum rubrum ATCC 11170, NC_007643; Rick. prow., Rickettsia prowazekii str. Madrid,
NC_000963; Tist. mobi., Tistrella mobilis KA081020-065, NC_017966; and Wolb-Droso., Wolbachia endosymbiont of Drosophila melanogaster,
NC_002978.



Fig. 1. — Continued 

by the suffixes -1, -2, etc. Genes on the outer circle are transcribed in a clockwise direction, and those on the inner circle are transcribed
counterclockwise. Transfer RNAs are indicated by the one-letter code of their cognate amino acid. Clockwise map order of tRNA genes is
represented by increasing distance of the gene label from the circle's center. The group II intron in trnW is shown as a yellow box. The innermost
circle serves as size marker. Arcs indicate the largest synteny block (fig. 2). Unfilled arrows indicate the positions of predicted promoters. Black,
filled arrowheads and the encircled "T" specify the predicted positions of replication origins and terminators, respectively. (A) Andalucia
godoyi, (B) Histiona aroides, (C) Jakoba bahamiensis, and (D) J. libera. The linear J. libera mitochondrial genome is presented as an open
circle, with red dots symbolizing the two extremities of the mtDNA molecule. Red arrows denote inverted repeats. The J. libera map shows only
ORFs > 98 residues. For the full listing of ORFs, see table 5. (E) Reclinomonas americana-94 (GenBank Accession no. NC_001823). The map of the
other three Reclinomonas strains is identical except for the size of equipositional ORFs (see list in boxes). (F) Seculamonas ecuadoriensis.
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organelle (Gray et al. 2004; Lang et al. 2011). Among the 
jakobid mitochondrial genomes analyzed here, only that of 
Andalucia encodes a complete set of tRNAs capable of read- 
ing all codons, assuming a mechanism whereby U in the first 
position of the anticodon permits recognition of all codons in a 
four-codon family (e.g., tRNA Ala with anticodon UGC). The 
29 different tRNAs specified by Andalucia mtDNA include sep- 
arate initiator and elongator tRNA Met isoacceptors, as well as 
an apparent tRNA Ile having a CAU anticodon. In this case, as in
bacteria, the first position (C) of the anticodon presumably 
undergoes modification to lysidine (L), thereby enabling the 
LAU anticodon to read the AUA codon as isoleucine. 
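
The decoding assumptions in the paragraph above can be illustrated with a small sketch. The pairing table below is a simplified assumption for illustration only (it is not taken from the paper): U in the first anticodon position is allowed to read all four nucleotides, G reads C and U, C reads G, and lysidine (written "L") reads A. The function name `codons_read` is hypothetical.

```python
# Watson-Crick partners for anticodon positions 2 and 3 (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified, assumed pairing rules for the FIRST anticodon position
# (which pairs with the THIRD codon position). "L" stands for lysidine.
FIRST_POSITION = {"U": "ACGU", "G": "CU", "C": "G", "L": "A"}


def codons_read(anticodon):
    """Return the sorted codons (5'->3') read by an anticodon (5'->3')."""
    a1, a2, a3 = anticodon.upper()
    # Codon position 1 pairs with anticodon position 3, and so on.
    c1, c2 = WATSON_CRICK[a3], WATSON_CRICK[a2]
    return sorted(c1 + c2 + c3 for c3 in FIRST_POSITION[a1])


print(codons_read("UGC"))  # the four GCN (Ala) codons
print(codons_read("CAU"))  # unmodified CAU pairs with AUG
print(codons_read("LAU"))  # lysidine-modified anticodon pairs with AUA
```

Under these assumptions, a single tRNA Ala (UGC) covers the whole GCN family, and converting C to lysidine switches the CAU anticodon from AUG to AUA, which is the behavior the text describes for the Andalucia tRNA Ile.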

Of the 29 tRNAs encoded by Andalucia mtDNA, 25 are 
shared with all the other jakobids (an additional mitochondrial 
tRNA Leu with GAG anticodon is present in all nine species 
except J. libera). Two further tRNAs, tRNA Ser and tRNA Val 
with GGA and UAC anticodons, respectively, are selectively 
shared between Andalucia and Seculamonas. Only Andalucia 
contains a native (mtDNA-encoded) tRNA Thr, whereas only
Histiona and J. bahamiensis mtDNAs encode a tRNA Leu with 
CAA anticodon. 

All jakobid mitochondrial tRNA sequences are able to 
assume the canonical cloverleaf secondary structure of a con- 
ventional tRNA. Occasional deviations from the typical struc- 
ture are mostly supported by their occurrence in more than 
one jakobid. Examples include A rather than U at position 8 in 
tRNA Ala (ugc) in all four Reclinomonas strains; C rather than A 
or G at position 9 in tRNA Glu (uuc) in all the jakobids; and a 
purine-purine mismatch in the first position of the anticodon 
stem of tRNA His (gug) and a UxU mismatch in tRNA Ile (cau) at
the fourth anticodon stem position (both of the latter cases in
Histiona and the four Reclinomonas strains). A particularly notable
feature involves the first position of the anticodon loop
of the elongator tRNA Met, which in tRNAs is almost universally
a pyrimidine. Atypically, this position is A in the elongator
tRNA Met of Andalucia, J. libera, and the four Reclinomonas 
strains (but G in Seculamonas). A is also found in this position 
in the mitochondrial elongator tRNA Met of many (although 
not all) other protists. 

In two Seculamonas mitochondrial tRNAs, tRNA Glu (uuc) 
and tRNA Ser (gga), mismatches in the first three acceptor 
stem positions suggested the possibility of tRNA editing, an 
inference verified experimentally (Leigh and Lang 2004). In 
these two instances, mismatches are converted to standard 
base pairs via removal and replacement of nucleotides at the 
3'-end of the transcript, rather than at the 5'-end, as in a 
number of other tRNA editing systems (Lonergan and Gray 
1993a, 1993b; Laforest et al. 1997). Aside from these two 
tRNAs, there is no compelling evidence that other jakobid
mitochondrial tRNAs undergo 5'- or 3'-editing. The homologous
Andalucia tRNA Ser (gga), for example, has a fully base-paired
acceptor stem encoded in the mtDNA, and in other cases,
nonstandard base pairs in the acceptor stem are overwhelmingly
G-U or U-G, which in Seculamonas have been shown
not to be edited (nor is an acceptor stem UxU mismatch in
the mitochondrial tRNA His of this organism) (Leigh and Lang
2004). With respect to the possibility of tRNA editing, the only
case that perhaps warrants further investigation is the elongator
tRNA Met of J. bahamiensis, in which the first three acceptor
stem positions are G-T, AxC, and T-G.

Finally, we note that the position immediately upstream of
the beginning of the mitochondrial trnH(gug) gene (the -1
position) is G in all the jakobids. Because G-1 is
an almost universal feature of tRNA His, constituting a required
identity element for histidylation, the jakobid mitochondrial
tRNA His potentially acquires G-1 via an abnormal RNase P
cleavage during pre-tRNA processing, as occurs in bacteria
(Jackman et al. 2012). If so, such a pathway would presumably
obviate a requirement for a mitochondrial tRNA His
guanylyltransferase, the enzyme that adds G-1 to the cytoplasmic
tRNA His in eukaryotes.

Examination of codon usage indicates that all jakobids
employ the standard genetic code for mitochondrial translation
(i.e., TGA does not specify Trp as in mitochondria of many
other eukaryotes). Unlike in non-jakobid mitochondria that
employ the standard genetic code, TGA stop codons occur
relatively frequently in Andalucia (9 of 65 assigned genes),
followed by J. libera (4 of 61); the other jakobids use this
termination codon only once or twice, or not at all. TAG
termination codons are also used relatively frequently in jakobid
mitochondria, the ratio TAG:TAA ranging from 0.4 in
J. bahamiensis to 0.1 in Reclinomonas-83. Mitochondrial TAG stop
codons are reasonably abundant in land plants (Arabidopsis
thaliana, with a ratio of 0.2) and some fungi (e.g.,
Gigaspora rosea and Glomus irregulare [Nadimi et al.
2012]) but are absent from the large majority of other
eukaryotes. In bacteria, TAA and TAG codons are served by
the peptide release factor RF1, whereas a second factor, RF2,
recognizes TAA and TGA (Scolnick et al. 1968). In mitochondria,
TGA seems to co-occur with a nucleus-encoded RF2-like
factor (Duarte et al. 2012), which we expect to be present also
in Andalucia and J. libera mitochondria.
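
As an illustration of how a TAG:TAA ratio of this kind could be tallied, the following is a minimal sketch. It assumes that each coding sequence is supplied as a DNA string ending with its stop codon; the function names and the toy input are hypothetical and do not reproduce the annotation pipeline used in the paper.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_usage(cds_list):
    """Count the terminal stop codon of each coding sequence."""
    counts = {stop: 0 for stop in STOP_CODONS}
    for cds in cds_list:
        last_codon = cds.upper()[-3:]
        if last_codon in counts:  # ignore sequences lacking a standard stop
            counts[last_codon] += 1
    return counts


def tag_taa_ratio(counts):
    """Return TAG:TAA, or infinity when no TAA stop codon is present."""
    return counts["TAG"] / counts["TAA"] if counts["TAA"] else float("inf")


# Toy data (invented): five genes ending in TAA, two in TAG, one in TGA.
toy_genes = ["ATGAAATAA"] * 5 + ["ATGCCCTAG"] * 2 + ["ATGGGGTGA"]
usage = stop_codon_usage(toy_genes)
print(usage, tag_taa_ratio(usage))
```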

RNase P-RNA 

P-RNA is the catalytic subunit of a ribonucleoprotein particle
(RNP) that processes tRNA 5'-ends via endonucleolytic cleavage
of precursor transcripts (Peck-Miller and Altman 1991).
Both the cytoplasm and mitochondria have their own RNase P
complex. The structural RNA associated with the mitochondrial
RNP is rarely encoded by mtDNA (only present in a few
fungi and protists); when it is, its secondary structure is often
highly derived, rendering identification difficult (e.g., Martin
and Lang 1997; Seif et al. 2003, 2005).

We discovered RNase P-RNA (rnpB) genes in all nine jakobid 
mtDNAs (Erpin search results are listed in supplementary table 
S3, Supplementary Material online), and the inferred RNA 
secondary structure of three representatives is depicted in 
figure 3. The Andalucia two-dimensional (2D) structure 
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Fig. 3. — Comparison of Andalucia and Reclinomonas mtP-RNA secondary structures. The Andalucia structure stands out as the shortest and most 
derived among jakobids, lacking P3, P12, and P19 pairings that are otherwise present in all jakobids (additionally, P19 is a hallmark feature of 
α-proteobacterial P-RNAs). The predicted 3'-end of the Andalucia mtP-RNA is adjacent to the downstream tRNA Gly (ucc) gene, that is, 5'-tRNA processing
by RNase P is responsible for 3'-maturation of its own RNA subunit. In most other jakobid mtDNAs, the tRNA Gly(ucc) gene is located directly upstream of 
the rnpB gene and transcribed from the opposite strand. 



stands out as the smallest and most derived among jakobids,
lacking P3, P12, and P19 pairings that are otherwise present in
all jakobids (note that P19 is a hallmark of α-proteobacterial
P-RNAs). In contrast to their bacterial counterparts (Stark et al.
1978; Altman 1989) and despite their bacteria-like RNA
2D-structure, P-RNAs of the four jakobids investigated
biochemically (Reclinomonas-94, J. bahamiensis, J. libera, and
S. ecuadoriensis) are unable to catalyze pre-tRNA cleavage
in the absence of protein factors (Seif et al. 2006).
Therefore, the structurally reduced RNA molecule of
Andalucia mitochondria is most likely also inactive by itself.
Interestingly, the predicted 3'-end of the Andalucia rnpB
directly abuts the downstream trnG(ucc) gene. Thus, 5'-tRNA
processing by RNase P would simultaneously generate the
mature 3'-terminus of its own P-RNA subunit.



Transfer-Messenger RNA 

Bacteria and some plastids use tmRNAs to recognize and 
liberate translation complexes that have stalled on mRNAs 
lacking a stop codon. tmRNAs also earmark incomplete pro- 
teins for proteolysis by appending a short, conserved peptide 
(for a review, see Karzai et al. 2000). In the first reaction step, 
the tRNA Ala -like domain of this RNA molecule triggers addition 
of a nonencoded Ala at the end of incomplete peptide chains. 
This step is followed by translation of the short mRNA-like 
region, which adds a signal peptide to the incomplete protein, 
marking the latter for degradation by C-terminal-specific 
proteases (Williams et al. 1999; Keiler et al. 2000; Zvereva 
et al. 2001). In α-proteobacteria and certain cyanobacteria,
tmRNAs consist of two separate pieces that are held together 
by RNA-RNA interactions. In the corresponding gene 
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Fig. 4. — Comparison of tmRNA secondary structures from Andalucia, Redinomonas-94, and Jakoba libera. Nucleotides marked blue are conserved in 
bacterial and jakobid tmRNAs. Nucleotides in red are identical among jakobid homologs. Yellow highlighting indicates the typical tRNA Ala identity element 
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3 ; -termini, and the site of post-transcriptional CCA addition is indicated. The Andalucia tmRNA is characterized by a short, compact structure and a 
shortened T-loop with an unusual sequence compared with the other counterparts. 



(designated ssrA), the tRNA- and mRNA-like domains are 
circularly permutated with respect to the gene product 
(Keiler et al. 2000); consequently, the gene remained unrec- 
ognized for many years. 

With the exception of J. bahamiensis mtDNA, jakobid
mitochondrial genomes encode a tmRNA, albeit in a reduced
form that lacks the mRNA-like domain (Jacob et al. 2004),
which likely restricts tmRNA function to liberating stalled
ribosomes. Similar to their α-proteobacterial ancestors, all
mtDNA-encoded tmRNAs except that of J. libera have the
two-piece configuration (Jacob et al. 2004) and contain the
known tRNA Ala identity elements, notably a G-U base pair at
the third position of the acceptor stem and an A as the
discriminator nucleotide preceding the CCA tail (Komine et al.
1994). All these features are also present in the inferred
Andalucia tmRNA (fig. 4); however, compared with
mitochondrial tmRNAs of the other jakobids, Andalucia's is
notable for its condensed secondary structure and shortened
T-loop with an uncommon sequence. Similarly divergent is
ssrA of J. libera mtDNA, which exists in a contiguous,
nonpermutated form. Finally, no ssrA was detected in the
mitochondrial genome of J. bahamiensis, despite a highly sensitive
covariance search that even recognizes the structurally distinct
J. libera sequence (supplementary table S4, Supplementary
Material online).

Replication of mtDNA 

Circular bacterial chromosomes typically replicate by the theta 
mode starting at a single bidirectional origin and ending at the 
replication terminus, roughly opposite of the origin. The lead- 
ing strand generally has an excess of guanines (G) relative to 
cytosines (C), a bias most likely due to different regimes of 
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mutations or DNA repair in the leading versus the lagging 
strand during theta replication (Lobry 1996). Analysis of the
GC skew, (G - C)/(G + C), along one strand of a given bacterial
chromosome provides a good indication of the position of the
origin and terminus of replication (Grigoriev 1998; Salzberg
et al. 1998; for illustration, see supplementary fig. S4A,
Supplementary Material online). However, to a minor
degree, strand-specific GC skew may also arise as a result of
codon bias in protein-coding regions (Tillier and Collins 2000
and references therein).
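
The GC-skew statistic described above can be sketched in a few lines. This is a minimal illustration rather than the procedure used for the supplementary plots: the window size, the function names, and the synthetic test sequence are assumptions, and the reading of the cumulative curve (minimum near the origin, maximum near the terminus) follows the usual convention for cumulative skew diagrams.

```python
from itertools import accumulate


def gc_skew_windows(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute a skew of zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq, window=1000):
    """Running sum of the per-window GC skew along the sequence."""
    return list(accumulate(gc_skew_windows(seq, window)))


def predict_origin_terminus(seq, window=1000):
    """Return approximate (origin, terminus) coordinates.

    Assumed convention: the cumulative curve reaches its minimum at the
    origin and its maximum at the terminus. Coordinates are window ends.
    """
    cum = cumulative_gc_skew(seq, window)
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return (min((i_min + 1) * window, len(seq)),
            min((i_max + 1) * window, len(seq)))


# Synthetic example: a C-rich half followed by a G-rich half.
toy = "C" * 3000 + "G" * 3000
print(cumulative_gc_skew(toy))
print(predict_origin_terminus(toy))
```

On a circular-mapping genome, a coordinate equal to the sequence length is equivalent to position 0. Real data would also call for smoothing and for masking the codon-bias effect mentioned above, neither of which is attempted here.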

We generated cumulative GC skew plots of jakobid
mtDNAs (supplementary fig. S4B-F, Supplementary Material
online; for predicted sites see fig. 1). Only the mtDNA of
R. americana strains exhibits a prominent bimodal GC skew
curve, characteristic of the classical bidirectional theta mode.
The relatively close spacing of the curves' minimum and
maximum suggests that the replication origin and termination
sites are not situated opposite to one another on the
circular-mapping chromosome but rather close to each other, so that
replication proceeds asymmetrically, with the clockwise
replication covering as much as approximately 80% of the
genome. mtDNA in Histiona exhibits a curve similar to that of
Reclinomonas, but much less pronounced. In contrast, the
GC skew of J. bahamiensis mtDNA is strikingly homogeneous
along the sequence, resembling the situation in the yeast
Candida glabrata (supplementary fig. S4G, Supplementary
Material online), where experimental evidence suggests rolling
circle replication of mtDNA. The GC skew graphs of the
remaining three circular jakobid mtDNAs do not allow
inferences about the replication mechanism.

The linear J. libera mtDNA displays a GC skew curve
with a major central maximum (supplementary fig. S4E,
Supplementary Material online), suggesting unidirectional
replication that starts at both ends and terminates in the middle
of the genome. This situation is reminiscent of linear-plasmid
(invertron) replication in Neurospora (Chan et al. 1991) and is
consistent with a plasmid insertion event as postulated above,
transforming the mitochondrial genome architecture from
circular to linear. Genome linearization mediated by plasmid
insertion has been previously demonstrated in maize
mitochondria, where the distal sequences of the linearized
mtDNA originate from a plasmid (Schardl et al. 1984). As in
autonomously replicating linear mitochondrial plasmids,
mtDNA termini in maize carry a covalently attached terminal
protein, which is thought to prime DNA synthesis during
replication (Sakaguchi 1990).

The linear mitochondrial genome of Candida subhashii also
appears to replicate in a manner similar to invertrons (Fricova et al. 2010)
(supplementary fig. S4, Supplementary Material online). It was
postulated that replication of this mtDNA relies on the
mitochondrion-encoded plasmid-derived B-family DNA
polymerase (dpo), but biochemical or genetic evidence for this
activity is lacking. The complete and seemingly functional dpo
gene in C. subhashii mtDNA may reflect a gene acquisition that
took place recently, leaving insufficient time to accumulate
mutations. For J. libera, it is even less likely that the
mitochondrial dpo gene plays a role in replication, because the deduced
Dpo protein lacks a 21-residue-long stretch in the C-terminal
region that is otherwise conserved in C. subhashii
mitochondrial Dpo and other members of B-family DNA polymerases.
Note that dpo genes also reside in select circular-mapping
mtDNAs; to our knowledge, however, in all cases, these
sequences undergo rapid mutational decay (e.g., Burger
et al. 1999; Barroso et al. 2001; Nadimi et al. 2012).

Potential Promoters and SD Motifs 

Because most jakobid mitochondrial genomes encode a
multisubunit RNA polymerase, we attempted to predict
bacteria-like promoters in two of the minimally derived
jakobid mtDNAs, those of Andalucia and Reclinomonas-94.
Combining predicted DNA-duplex destabilized regions (potential
promoter motifs) and using a promoter score threshold
inferred from shuffled sequence (see Materials and Methods),
we detect four and seven promoter candidates in Andalucia
and Reclinomonas-94 mtDNA, respectively (supplementary
table S5, Supplementary Material online; arrows in fig. 1).
Obviously, experimental transcript data will be needed to
confirm the functionality of bacteria-like promoters proposed
here in jakobid mtDNAs.

In bacteria, SD-like sequence motifs assure selection of the
proper translation initiation codon by pairing with the 3'-end
of SSU rRNA. Up to this point, the only organism with
convincing putative mitochondrial SD-like sequence motifs had
been Reclinomonas-94 (Lang et al. 1997). Here we show
that, with the exception of J. libera, such motifs are also
present in the other jakobid mtDNAs (supplementary table S6,
Supplementary Material online). These motifs are complementary
to a pyrimidine-rich sequence stretch at the inferred
3'-end of mitochondrial SSU rRNA (5'-CUCCUUU-OH, compared
with 5'-CUCCUUA-OH in E. coli 16S rRNA) and are located 7-15 nt
upstream of the initiator ATG of protein-coding genes. The
largest number of 6-nt-long SD-like motifs (5'-AAAGGA-3') is
present in Andalucia mtDNA, upstream of 28 of 64 assigned
protein-coding genes. In this genome, another 22 genes
are preceded by a 5-nt-long motif (5'-AAAGG-3') that
is probably also functional. In the other jakobids, the
number of 6- and 5-nt-long SD-like motifs decreases from
31 to 15 in the order Reclinomonas-94 > Reclinomonas-84 >
Reclinomonas-83 > Seculamonas > Histiona > J. bahamiensis.
In J. libera mtDNA, where SD-like motifs are absent,
genes are preceded by stretches of A- or T-rich sequence.
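
A simple scan for the two motifs named above might look as follows. This is a sketch under stated assumptions, not the authors' method: the distance of 7-15 nt is interpreted here as the number of nucleotides between the last base of the motif and the A of the ATG, exact string matching replaces the free-energy criterion mentioned in table 5, and the function name and coordinates are hypothetical.

```python
# Motifs named in the text, longest first so the 6-nt motif takes priority.
SD_MOTIFS = ("AAAGGA", "AAAGG")


def find_sd_motif(genome, atg_pos, min_gap=7, max_gap=15):
    """Look for an SD-like motif upstream of the ATG at index atg_pos.

    `gap` is the assumed spacing: nucleotides between the motif's last
    base and the A of the ATG. Returns (motif, gap) or None.
    """
    genome = genome.upper()
    for motif in SD_MOTIFS:
        for gap in range(min_gap, max_gap + 1):
            start = atg_pos - gap - len(motif)
            if start >= 0 and genome[start:start + len(motif)] == motif:
                return motif, gap
    return None


# Toy sequence (invented): motif, an 8-nt spacer, then the start codon.
toy = "TTTT" + "AAAGGA" + "T" * 8 + "ATGAAA"
print(find_sd_motif(toy, toy.index("ATG")))
```

Applied to every annotated start codon of a genome, a tally of the returned motifs would give counts comparable in spirit to those quoted above, although the exact numbers depend on how the spacing and pairing energy are defined.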

Other Regulatory Sequence Elements 

Other potential regulatory cis-elements in jakobid mtDNAs
include conspicuous palindromic sequences in intergenic 
regions that have a propensity to form hairpin secondary 
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structures. Although mitochondrial genomes of most jakobids 
exhibit up to 10 hairpin elements with a stem of >10 bp,
that of Seculamonas is particularly palindrome-rich (>40
elements). Interestingly, six jakobid mtDNAs share a hairpin
element located immediately downstream of rnl. Such hairpins
may play a number of biological roles, including the control
of RNA processing and translation. For example, hairpin
sequence elements in 3'-untranslated regions of mitochondrial
mRNAs have been suggested to be recognition sites for
RNases (Schuster et al. 1986) and thus may mediate processing
of polycistronic precursor transcripts as well as end
processing of single-gene transcripts. Alternatively, stem-loop
structures at the 3'-terminus of mRNAs have been found to
stabilize transcripts in both bacteria and chloroplasts (Stern
and Gruissem 1987; Manley and Proudfoot 1994; Rochaix
1996; Leigh and Lang 2004). Finally, hairpin sequence
elements seen in jakobid mtDNAs could play a part in
transcription termination. In bacteria, for instance, one of the
mechanisms is ρ-independent termination, which involves
the formation of a stem-loop structure in the transcript
upstream of a U-rich sequence stretch (Richardson 2002).
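
Candidate hairpin elements of the kind discussed above can be located with a naive inverted-repeat search. The sketch below is illustrative only: it accepts perfect stems, the default stem and loop lengths are assumptions rather than the criteria used in the paper, and stems longer than the minimum are reported as several overlapping hits instead of being merged.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, min_stem=10, min_loop=3, max_loop=8):
    """Return (stem_start, loop_length) for perfect inverted repeats.

    A hit means that the min_stem bases starting at stem_start are
    followed, after loop_length unpaired bases, by their reverse
    complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * min_stem - min_loop + 1):
        target = revcomp(seq[i:i + min_stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + min_stem + loop
            if seq[j:j + min_stem] == target:
                hits.append((i, loop))
    return hits


# Toy sequence (invented): a 10-bp G/C stem around a 4-nt loop.
stem = "GCGGCCGCGG"
toy = "AAAA" + stem + "AAAA" + revcomp(stem) + "AAAA"
print(find_hairpins(toy))
```

A realistic survey would allow G-U pairs and mismatches and would score stems thermodynamically; dedicated RNA folding software is better suited to that task than exact matching.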

Phylogeny and Evolutionary Inferences Based on 
Mitochondrial Proteins 

We conducted a phylogenetic analysis employing PhyloBayes
and the CAT model, and using concatenated mtDNA-encoded
protein sequences from jakobids (with
Reclinomonas-94 representing all strains of this genus),
representatives of other major eukaryotic groups, and
α-proteobacteria as an outgroup (a total of 59 species and
5,805 aligned amino acid positions). The resulting tree
demonstrates monophyly of jakobids and unambiguous positioning
of A. godoyi as the deepest divergence within this group
(fig. 5). Most phylogenetic relationships in the tree are
consistent with inferences based on nuclear gene data; most
inconsistencies are not well supported in the mitochondrial
phylogeny and are probably artifactual, due to the relatively
small data matrix and/or known phylogenetic artifacts
such as long-branch attraction (LBA; Felsenstein 1978). As in
other analyses published previously (e.g., Andersson et al.
1998; Derelle and Lang 2011), the tree shown here specifically
groups mitochondria with Rickettsiales. However, this alliance
may result from LBA, further exacerbated by A + T sequence
bias (Foster and Hickey 1999), as suggested by the fast
evolutionary rates and the short common branch uniting these
two lineages (fig. 5).

Discussion 

Differential Gene Migration as a Factor Driving mtDNA 
Diversity 

Evolutionary gene loss from mtDNA is mostly a consequence
of gene migration to the nucleus. For example, atp1, atp3,
and atp4, present in mtDNA of jakobids and a few other
eukaryotes, reside in the nucleus of animals and fungi. Gene
loss can also be due to functional substitution by a nuclear
gene. This scenario likely applies to tRNA Thr, because an
mtDNA-specified gene is missing in all jakobids except
Andalucia (table 4). Import of tRNAs into mitochondria has
been demonstrated experimentally in several eukaryotes
(reviewed in Alfonzo and Söll 2009).

A more complicated case is the mitochondrial RNA
polymerase. In eight of the nine jakobids studied here, mtDNA
carries all four genes (rpoA-D) specifying a typical multisubunit
bacteria-like RNA polymerase, and our in silico analyses predict
the presence of bacteria-like promoter motifs. Jakoba libera is
the only jakobid having an incomplete set of
mitochondrion-encoded bacteria-like RNA polymerase genes, with rpoA and
rpoD genes apparently missing. Again, the latter two genes
may have emigrated to the nucleus, with the result that the
J. libera mitochondrial α2ββ'σ RNA polymerase is assembled
from mitochondrion- and cytosol-synthesized proteins.
Alternatively, the remaining rpo genes in J. libera mtDNA
may be vestigial, and mitochondrial transcription may be
performed either by only these two core polymerase subunits or,
as in the large majority of eukaryotes, by a T3/T7-phage-like
RNA polymerase (Cermakian et al. 1996, 1997). Finally, it is
also conceivable that in J. libera mitochondria, bacteria-like
and phage-like transcription machineries operate
simultaneously and mediate expression of distinct subsets of genes,
as seen in chloroplasts of certain plants (Gray and Lang 1998).
To gain insight into the biological role of the
mitochondrion-encoded rpoB and rpoC in J. libera, sequence information
from the nuclear genome will be required as well as
biochemical experiments.

What Is the Origin of cox15?

The mtDNA-encoded cox15 gene in Andalucia is most intriguing,
as its protein sequence closely resembles Cox15 of the
free-living α-proteobacterium Tistrella, both in terms of
sequence similarity and indel signatures (supplementary fig.
S1, Supplementary Material online). The latter are not
shared with members of Rickettsiales, with other
α-proteobacteria, or, most surprisingly, with the
nucleus-encoded versions throughout eukaryotes. It is conceivable
that the mitochondrial endosymbiont was closely related to
Tistrella and that cox15 was introduced into the nuclear
genome via mitochondrion-to-nucleus gene migration
followed by a rapid change of indel signatures in the nuclear
gene. Other more complex scenarios are also plausible, such
as horizontal gene transfer from bacteria (transiently
associated with eukaryotes) to either the mitochondrion or the
nucleus. However, these scenarios are not testable with the
currently available data, because the Cox15 protein is
relatively short and insufficiently conserved. Meaningful
phylogenetic analyses would require genome data from Tistrella
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Fig. 5. — Phylogeny based on mtDNA-encoded proteins. The tree was constructed based on concatenated mtDNA-encoded protein sequences from 
jakobids, slowly evolving representatives of other, major eukaryotic groups, 16 a-proteobacteria (a total of 52 species and 5,791 aligned amino acid 
positions), and Magnetococcus as a close, non a-proteobacterial outgroup species. Reclinomas-94 represents all four, very similar Reclinomonas species 
studied here. The tree shown was inferred with PhyloBayes, the CAT + Gamma model, and six discrete categories (Lartillot and Philippe 2004), based on 
19 mtDNA-encoded, concatenated proteins. Numbers indicate jackknife support values; branches without numbers have 100% support values. Short, basal 
branches with less than 60% support were collapsed. Posterior probability values (PhyloBayes analysis with four independent chains) other than 1.0 are 
indicated in brackets, following the corresponding jackknife value. Maximum likelihood inference predicts a similar tree (not shown), except for a strong 
(100% bootstrap value) regrouping of the fast-evolving Amoebozoa with haptophytes, which is probably an LBA artifact. Full taxon names and corre- 
sponding GenBank accession numbers for bacterial sequences: Agrobacterium radiobacter, NC_011985, NC_011983; Alphaproteobacterium BALI 99, 
NZ_ABHC01 000000; Azospirillum brasilense, NC_016617; Bradyrhizobium japonicum, NC_004463; Ehrlichia canis, NC_007354; Magnetococcus sp., 
NC_008576; Micavibrio aeruginosavorus, NC_016026; Magnetospirillum magnetotacticum, NZ_AAAP00000000; Mezorhizobium loti, NC_002682, 
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434 Genome Biol. Evol. 5(2):418^38. doi:10.1093/gbe/evt008 Advance Access publication January 18, 2013 



Jakobid Mitochondrial Genomes 



GBE 



relatives, plus sequences from nucleus-encoded cox15 ver- 
sions in jakobids and other protist lineages. Note that 
Tistrella is listed under Rhodospirillaceae in the NCBI taxon- 
omy, but according to our analysis shown here, it diverges 
basally to oc-proteobacteria and without apparent affinity to 
any of its subgroups. 

Evolution of Jakobids from a Mitochondrial Perspective 

Although the depicted mitochondrial phylogeny is well 
resolved and spans the entire range of known jakobid 
diversity, the data set of mtDNA-encoded protein sequences is 
still too small to resolve the relationship of jakobids to 
other eukaryotes and in particular to the jakobid-like (in an 
ultrastructural sense; O'Kelly and Nerad 1999) malawimonads. 
However, even the large nuclear data sets are unable to 
reproducibly position jakobids relative to malawimonads, and 
analyses with different data sets do not concur 
(Rodríguez-Ezpeleta et al. 2007; Hampl et al. 2009; Derelle 
and Lang 2011). Still, the tree based on mtDNA-encoded 
proteins provides convincing support for jakobid monophyly; 
it is also the first to place Andalucia godoyi as the deepest 
divergence within jakobids, to cluster Histiona as a sister 
taxon of Reclinomonas, and to unite the two Jakoba species 
(fig. 5). 

As a result, key events in mitochondrial genome evolution 
in jakobids can be "dated" by mapping their occurrence onto 
the tree (fig. 6). For example, the loss of the tRNA(Thr) gene 
and the group II intron insertion in the tRNA(Trp) gene likely 
took place very early in jakobid evolution, and the tRNA intron 
was probably lost secondarily in the lineage leading to 
J. libera. Editing of mitochondrial tRNAs at their 3′ ends, 
which occurs exclusively in S. ecuadoriensis, probably evolved 
recently in this particular branch. The most dramatic 
evolutionary events occurred in the J. libera lineage, notably 
loss of mtDNA-encoded rpoA and rpoD genes (table 3), 
acquisition of long intergenic regions and numerous ORFs, 
divergent RNA secondary structures (e.g., fig. 4), loss of 
SD-like motifs, reversion of tmRNA to a nonpermutated 
continuous molecule, and genome linearization. This 
evolutionary acceleration may have been triggered by a plasmid 
insertion into J. libera mtDNA, as discussed earlier. 

[Figure 6: tree of jakobids with mapped events. Event labels recoverable from the figure: "-trnT, -rpl35, -cox15, +trnW intron"; "+3' tRNA-editing"; "-rpoA, -rpoD, -rps1, -rpl18, -rpl19, -trnW intron, -SD motifs; reversal of tmRNA permutation; genome linearization". Taxon labels: Jakoba libera, Jakoba bahamiensis, Reclinomonas, Histiona, Andalucia.] 

Fig. 6. Evolutionary changes in jakobid mitochondrial genomes. 
The events are mapped onto the phylogenetic tree of jakobids assuming 
the smallest number of steps. -, loss; +, gain. For alternative evolutionary 
histories of cox15, see Discussion. The tree topology is taken from figure 5. 
For full species names, see the legend of figure 5. 
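
The idea of mapping gene gains and losses onto the tree with the smallest number of steps (fig. 6) can be illustrated with a small Fitch parsimony sketch. This is not the procedure used for the figure, only a toy example. The nested-tuple tree, the place where Seculamonas attaches to it, and the names `TREE`, `TRNT`, `RPOA`, and `fitch` are assumptions made for illustration. The presence/absence values follow the text: the tRNA(Thr) gene is mtDNA-encoded only in Andalucia, and rpoA is missing only in J. libera.

```python
# Toy Fitch parsimony: count the minimum number of state changes
# needed to explain a presence/absence character on a rooted tree.

# Tree as nested 2-tuples; leaves are taxon names. Andalucia as the
# deepest branch, Histiona + Reclinomonas, and the two Jakoba species
# together follow the text; where Seculamonas attaches is an ASSUMPTION.
TREE = (
    "Andalucia",
    (
        ("Histiona", "Reclinomonas"),
        ("Seculamonas", ("Jakoba_libera", "Jakoba_bahamiensis")),
    ),
)

TAXA = ["Andalucia", "Histiona", "Reclinomonas", "Seculamonas",
        "Jakoba_libera", "Jakoba_bahamiensis"]

# 1 = gene encoded in mtDNA, 0 = not encoded in mtDNA.
TRNT = {taxon: int(taxon == "Andalucia") for taxon in TAXA}
RPOA = {taxon: int(taxon != "Jakoba_libera") for taxon in TAXA}


def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for a subtree."""
    if isinstance(node, str):          # leaf: observed state, no change
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    shared = left_set & right_set
    if shared:                         # children agree: no extra change
        return shared, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    for name, character in (("trnT", TRNT), ("rpoA", RPOA)):
        root_states, changes = fitch(TREE, character)
        print(name, "minimum changes:", changes,
              "root states:", sorted(root_states))
```

On this toy tree each character needs a single change. For trnT the root state stays ambiguous without an outgroup; if presence is taken as ancestral, the single change becomes one loss after the divergence of Andalucia, in line with the early loss proposed in the text.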



Conclusions and Outlook 

Jakobids stand out from other eukaryotes by reason of the 
elevated gene complement of their mitochondrial genomes, 
which specify molecular functions not encoded by any other 
mtDNA. Jakobids are also exceptional as to the bacteria-like 
features of mtDNA-encoded structural RNAs and bacteria-like 
regulatory elements presumably used in mitochondrial gene 
expression. Although Reclinomonas-94 was initially 
characterized as a unique organism whose mtDNA is 
extraordinarily ancestral, we show here that this protist 
belongs to a sizeable eukaryotic group that had remained 
unrecognized for decades. 

The analysis of jakobid mitochondrial genomes reported 
here raises numerous new research questions. For example, 
it would be interesting to validate predictions of translation 
initiation, operons, transcription terminators, and replication 
origins. In addition, a biochemical characterization of jakobid 
mitochondrial RNA polymerases in J. libera would be in order. 
However, such studies will not be trivial, as jakobids are 
difficult to culture, and no protocols are yet available for 
isolating pure and intact mitochondria from these organisms 
(the main hurdle being that the organelle is firmly integrated 
with other subcellular structures [O'Kelly 1993]). Similarly, 
genetic manipulations that would allow the exploration of gene 
function have not been established for any jakobid. 

Fig. 5. Continued 

NC_002679, NC_002678; Midichloria mitochondrii, NC_015722; Neorickettsia sennetsu, NC_007798; Orientia tsutsugamushi str. Ikeda, 
NC_010793; Novosphingobium aromaticivorans, NC_007794; Rhodospirillum centenum, NC_011420; Rhodospirillum rubrum ATCC 11170, 
NC_007643; Rickettsia prowazekii str. Madrid, NC_000963; Tistrella mobilis, NC_017966; and Wolbachia endosymbiont of Drosophila 
melanogaster, NC_002978. Full taxon names and corresponding GenBank accession numbers for mitochondrial sequences: Acanthamoeba castellanii, 
NC_001637; Andalucia godoyi, this report; Bigelowiella natans, HQ840955; Capsaspora owczarzaki (unpublished, available upon request); 
Chaetosphaeridium globosum, NC_004118; Chondrus crispus, NC_001677; Chattonella marina, NC_013837; Cyanidioschyzon merolae, 
NC_000887; Cyanophora paradoxa, NC_017836; Desmarestia viridis, NC_007684; Emiliania huxleyi, NC_005332; Geodia neptuni, NC_006990; 
Glaucocystis nostochinearum, NC_015117; Hartmannella vermiformis, NC_013986; Hemiselmis andersenii, NC_010637; Histiona aroides, this 
report; Jakoba bahamiensis, this report; J. libera, this report; Malawimonas californiana (unpublished, available upon request); Marchantia 
polymorpha, NC_001660; Mesostigma viride, NC_008240; Nephroselmis olivacea, NC_008239; Malawimonas jakobiformis, NC_002553; 
Monosiga brevicollis, NC_004309; Pavlova lutheri, HQ908424; Phytophthora infestans, NC_002387; Porphyra purpurea, NC_002007; 
Prototheca wickerhamii, NC_001613; Reclinomonas americana, NC_001823; Rhodomonas salina, NC_002572; Seculamonas ecuadoriensis, 
this report; and Thalassiosira pseudonana, NC_007405. 

Because the gene complement indicates that the 
mitochondrial genome of Andalucia is the most slowly evolving 
of all mtDNAs, we expect as well that its nuclear genome 
has preserved primitive features. Therefore, we recently 
initiated a collaborative Andalucia nuclear genome project. 
With a complete nuclear gene complement becoming available, 
it will be worthwhile to establish Andalucia as a eukaryotic 
model organism for biochemical studies, because it holds 
the promise of opening a window on the early evolution 
of cellular components, biological processes, and molecular 
functions of the eukaryotic cell. 

Supplementary Material 

Supplementary figures S1-S4 and tables S1-S6 are available 
at Genome Biology and Evolution online 
(http://www.gbe.oxfordjournals.org/). 